State Transition Constraints

About two decades ago, I introduced the concept of transition constraints to show Data Validation in a database is a lot more complex than seeing if a string parameter really is an integer. In October of 2008, I did an article called Constraint Yourself! on how to use DDL constraints to assure data integrity. One of the topics in that piece was a look at state transition constraints via an auxiliary table.

Introduction

Let me give you some introduction, and for some of you flashbacks to your early computer science classes. There is an initial state, flow lines that show what are the next legal states, and one or more termination states. These diagrams also called finite state automata, and you’ll see it in freshman computer science classes. To date, we might implement them with a graph database. What is worth relearning are our old tricks from SQL.

This state transition diagram was deliberately simplified, but it is good enough to explain principles. To keep the discussion as simple as possible, my table is for only one person’s marital status over his life. Here is a skeleton DDL with the needed FOREIGN KEY reference to valid state changes and the date that the current state started.

CREATE TABLE MyLife (

previous_state VARCHAR(10) NOT NULL,

current_state VARCHAR(10) NOT NULL,

CONSTRAINT Improper_State_Change

FOREIGN KEY (previous_state, current_state)

REFERENCES StateChanges (previous_state, current_state),

start_date DATE NOT NULL PRIMARY KEY,

--etc.

);

What is not shown on it are which nodes are initial states (in this case “Born”) and which are terminal or final states (in this case “Dead”, a very terminal state of being). A terminal node can be the current state of a middle node, but not a prior state. Likewise, an initial node can be the prior state of a middle node, but not the current state. I did not write any CHECK() constraints for those conditions. It is easy enough to write a quick query with an EXISTS() predicate to do this and I will leave that as an exercise for the reader. Let’s load the diagram into an auxiliary table with some more constraints.

CREATE TABLE StateChanges (previous_state VARCHAR(10) NOT NULL,

current_state VARCHAR(10) NOT NULL,

PRIMARY KEY (previous_state, current_state),

state_type CHAR(1) DEFAULT 'M' NOT NULL

CHECK (state_type IN ('I', 'T', 'M')), /*initial, terminal, middle*/

CONSTRAINT Node_type_violations

CHECK (CASE WHEN state_type IN ('I', 'T')

AND previous_state = current_state

THEN 'T'

WHEN state_type = 'M'

AND previous_state <> current_state

THEN 'T' ELSE 'F' END = 'T')

);

INSERT INTO StateChanges (previous_state, current_state, state_type)

VALUES ('Born', 'Born', 'I'), -- initial state

('Born', 'Married', 'M'),

('Born', 'Dead', 'M'),

('Married', 'Divorced', 'M'),

('Married', 'Dead', 'M'),

('Divorced', 'Married', 'M'),

('Divorced', 'Dead', 'M'),

('Dead', 'Dead', 'T'); -- terminal state

The Time Dimension

An aspect of this problem that I did not consider in the first article is the time dimension. We want to see a temporal path from an initial state to a terminal state. State changes do not happen all at once, but are spread over time. An acorn becomes an oak tree before it becomes lumber and finally my chest of drawers. The acorn does not jump immediately to being a chest of drawers.

Some of the changes are controlled by time. For example, one cannot get married immediately after being born, they have to wait to be of legal age. A business offer can expire in a set number of days. I am sure you can think of many additional examples as well.

For a production system, you would need a more complete set of temporal columns to guarantee that we have no gaps in the history, but this will do for now. We now need a stored procedure to add data to the MyLife table. Here is one solution which is deliberately broken into clear steps for clarity.

CREATE PROCEDURE Change_State

(@in_change_date DATE,

@in_change_state VARCHAR(10))

BEGIN

DECLARE @most_recent_state VARCHAR(10);

SET @most_recent_state

= (SELECT current_state

FROM MyLife

WHERE start_date

= (SELECT MAX(start_date) FROM MyLife));

/* insert initial state if empty */

IF NOT EXISTS (SELECT * FROM MyLife)

AND @in_change_state

IN (SELECT previous_state

FROM StateChanges

WHERE state_type = 'I')

BEGIN

INSERT INTO MyLife (previous_state, current_state, start_date)

VALUES (@in_change_state, @in_change_state, @in_change_date);

RETURN;

END;

/* must be a real state change */

IF @in_change_state = @most_recent_state

BEGIN

RAISERROR ('This does not change the state.', 16, 1);

RETURN;

END;

/* must move forward in time */

IF @in_change_date <= (SELECT MAX(start_date) FROM MyLife)

BEGIN

RAISERROR ('Violates time sequence.', 16, 1);

RETURN;

END;

INSERT INTO MyLife (previous_state, current_state, start_date)

VALUES (@most_recent_state, @in_change_state, @in_change_date);

END;

The first block of code locates the most recent state of my life, based on the date. The second block of code will insert an initial state if the table is empty. This is a safety feature but there probably ought to be a separate procedure to create the set of initial states. The new state has to be an actual change, so there is a block of code to be sure. The changes must move forward in time. Finally, we build a row using the most recent state as the new previous state, the input change state and the date. If the state change is illegal, the FOREIGN KEY is violated, and we get an error.

If you had other business rules, you could also add them to the code in the same way. You should have noticed that if someone makes changes directly to the MyLife Table, they are pretty much free to screw the data. It is a good ideas to have a procedure that checks to see that MyLife is in order. Let’s load the table with bad data:

INSERT INTO MyLife (previous_state, current_state, start_date)

VALUES ('Born', 'Married', '1990-09-05'),

('Married', 'Divorced', '1999-09-05'),

('Married', 'Dead', '2010-09-05'),

('Dead', 'Dead', '2011-05-10'),

('Dead', 'Dead', '2012-05-10');

This poor person popped into existence without being properly born , committed bigamy and died twice. And you think your life is tough! Here is a simple validation procedure to catch those errors.

CREATE PROCEDURE ValidateLife

WITH Sequenced_State_History

(SELECT previous_state, current_state,

ROW_NUMBER() OVER (ORDER BY start_date)

AS change_seq

FROM MyLife)

/* There is chain of links from the initial state to the current state */

SELECT 'Missing link(s) in History'

FROM Sequenced_State_History AS H1, Sequenced_State_History AS H2

WHERE H1.change_seq + 1 = H2.change_seq

AND H1.current_state <> H2.previous_state

UNION ALL

/* has one and only one initial state */

SELECT 'No unique initial state.'

FROM MyLife AS M, StateChanges AS C

WHERE C.state_type = 'I'

AND M.previous_state = C.previous_state

AND M.current_state = C.previous_state

HAVING COUNT(*) <> 1

UNION ALL

/* has zero or one terminal state */

SELECT 'Too many terminal states.'

FROM MyLife AS M, StateChanges AS C

WHERE C.state_type = 'T'

AND M.previous_state = C.previous_state

AND M.current_state = C.previous_state

HAVING COUNT(*) > 1;

The CTE numbers the steps of the temporal path from an initial node to a middle or terminal node. This chain has to be unbroken which means going from step (n) to step(n+1) has to be a legal change in the StateChanges table. This chain can have only one initial node, so let’s check for that next. Finally, the chain is either still in progress or it has reached a single terminal node.

A little note on the programming technique used. The union of separate queries to do one validation at a time can often be made faster by combining some of the queries. But there are trade-offs; this code is easy to read and to maintain and (hopefully) it will not be run often. It is also hard to get error messages from a single statement. Look back at the ChangeState() procedure; the two IF and RAISERROR() blocks of code could have been converted into CASE expressions that will generate NULLs, folded into the INSERT INTO statement and cause the insertion to fail.

INSERT INTO MyLife (previous_state, current_state, start_date)

VALUES

(NULLIF (@in_change_state, @most_recent_state),

@in_change_state,

CASE WHEN @in_change_date

<= (SELECT MAX(start_date) FROM MyLife)

THEN NULL ELSE @in_change_date END);

This is not easy to read or to get error messages that tell you if the @in_change_date is invalid in that it violates the time sequence.

The Temporal Side of Changes

What is still missing is the temporal aspect of state changes. In this example, the (Born’, ‘Married’) change would have to deal with the minimum age of consent. The (Married’, ‘Divorced’) change often has a legal waiting period. While technically a business rule, you know that no human being has lived over 150 years, so a gap that size is a data error. The terminal and initial states are instantaneous, however. Let’s add more flesh to the skeleton table:

CREATE TABLE StateChanges

(previous_state VARCHAR(10) NOT NULL,

current_state VARCHAR(10) NOT NULL,

PRIMARY KEY (previous_state, current_state),

state_type CHAR(1) DEFAULT 'M' NOT NULL

CHECK (state_type IN ('I', 'T', 'M')), /*initial, terminal, middle*/

state_duration INTEGER NOT NULL -- unit of measure is months

CHECK (state_duration >= 0),

CONSTRAINT Node_type_violations

CHECK (CASE WHEN state_type IN ('I', 'T')

AND previous_state = current_state

THEN 'T'

WHEN state_type = 'M'

AND previous_state <> current_state

THEN 'T' ELSE 'F' END = 'T')

);

To make up some data, let’s assume that the age of consent is 18 (12 months * 18 years = 216), that you have to wait 3 months into your marriage before getting a divorce, and that you must be divorced 2 months before you can re-marry. Of course, you can die instantly.

First I will clear the data from the table, add a column for the required months between actions, and then reload the table.

--drop constraint that references StateChanges

ALTER TABLE MyLife DROP CONSTRAINT Improper_State_Change;

--clear out domain data

DELETE FROM StateChanges;

--add the new column for the state duration requirements

ALTER TABLE StateChanges ADD state_duration INTEGER NOT NULL;

--reload table

INSERT INTO StateChanges (previous_state, current_state,

state_type,state_duration)

VALUES ('Born', 'Born', 'I', 0), -- initial state

('Born', 'Married', 'M', 216),

('Born', 'Dead', 'M', 0),

('Married', 'Divorced', 'M', 3),

('Married', 'Dead', 'M', 0),

('Divorced', 'Married', 'M', 2),

('Divorced', 'Dead', 'M', 0),

('Dead', 'Dead', 'T', 0); -- terminal state

--add back the constraint

ALTER TABLE MyLife

ADD CONSTRAINT Improper_State_Change

FOREIGN KEY (previous_state, current_state)

REFERENCES StateChanges (previous_state, current_state)

The first question is where to check for temporal violations; during insertion or with validation procedures? My answer is both. Whenever possible, do not knowingly put bad data into a schema so this should be done in the ChangeState() procedure. But someone or something will subvert the schema and you have to be able to find and repair the damage.

Here is a query which will tell you what state change in the chain has an improper duration and what the disagreement is.

WITH Sequenced_State_History

(SELECT previous_state, current_state, start_date,

ROW_NUMBER() OVER (ORDER BY start_date) AS change_seq

FROM MyLife)

/* There is chain of links from the initial state to the current state */

SELECT H2.change_seq, H2.previous_state, H2.current_state,

DATEDIFF (MM, H1.start_date, H2.start_date) AS actual_state_duration,

C.state_duration AS expected_state_duration

FROM Sequenced_State_History AS H1,

Sequenced_State_History AS H2,

StateChanges AS C

WHERE H1.change_seq + 1 = H2.change_seq

AND DATEDIFF (MM, H1.start_date, H2.start_date) <= C.state_duration

AND C.previous_state = H2.previous_state

AND C.current_state = H2.current_state;

Inserting a new life change is not a simple matter of putting a (previous_state, current_state, start_date) row into the table. To do it right, you can put conditions into the INSERT INTO statement to cause errors when there is bad data.

CREATE PROCEDURE Life_Status_Change

(@in_change_state VARCHAR(10),

@in_most_recent_state VARCHAR(10),

@in_change_date DATE)

INSERT INTO MyLife (previous_state, current_state, start_date)

VALUES

(NULLIF (@in_change_state, @in_most_recent_state),

@in_change_state,

CASE WHEN @in_change_date

<= (SELECT MAX(start_date) FROM MyLife)

THEN NULL ELSE @in_change_date END);

A slightly different model will keep a (start_date, expiry_date) pair in the history table. In the case of the MyLife example, the durations were the minimum values for certain changes. But a lot of commercial situations have a fixed lifespan. Warranties, commercial offers and bids expire in a known number of days. This means adding another column to the StateChanges table that tells the insertion program if the expiration date is optional (shown with a NULL) or mandatory (computed from the duration).

Here is some skeleton DDL for a bid application to explain this better.

CREATE TABLE MyBids

(bid_nbr INTEGER NOT NULL,

previous_state VARCHAR(10) NOT NULL,

current_state VARCHAR(10) NOT NULL,

CONSTRAINT Improper_State_Change

FOREIGN KEY (previous_state, current_state)

REFERENCES StateChanges (previous_state, current_state),

start_date DATE NOT NULL,

expiry_date DATE, -- null means still open.

CHECK (start_date <= expiry_date),

PRIMARY KEY (bid_nbr, start_date),

--etc.

);

The DDL has a bid number and the start_date as the primary key and a new column for the expiration date. Obviously, the bid has to exist for a while, so add a constraint to keep the date order right.

The next DDL will extend the state change table to be a bit more flexible (if you wish to test this out, the process to recreate this would be the same as in the previous section).

CREATE TABLE StateChanges

(previous_state VARCHAR(10) NOT NULL,

current_state VARCHAR(10) NOT NULL,

PRIMARY KEY (previous_state, current_state),

state_duration INTEGER NOT NULL,

duration_type CHAR(1) DEFAULT 'O' NOT NULL

CHECK ('O', 'M')), -- optional, mandatory

- etc.

);

The DDL for the state changes gets a new column to tell us if the duration is optional or mandatory. The insertion procedure is a bit trickier. The VALUES clause has more power than most programmers use. The list can be more than just literal values or simple scalar variables. But using CASE expressions lets you avoid if-then-else procedural logic in the procedure body.

All it needs is the bid number and what state you want to use. If you don’t give me a previous state, I assume that this is an initial row and repeat the current state you just gave me. If you don’t give me a start date, I assume you want the current date. If you don’t give me an expiration date, I construct one from the StateChanges table with a scalar subquery. Here is the skeleton DDL for an insertion procedure.

CREATE PROCEDURE Bid_Status_Change

(@in_bid_nbr INTEGER,

@in_previous_state VARCHAR(10),

@in_current_state VARCHAR(10),

@in_start_date DATE,

@in_expiry_date DATE)

INSERT INTO MyBids (bid_nbr, previous_state, current_state, start_date, expiry_date)

VALUES (@in_bid_nbr, -- required

COALESCE (@in_previous_state, @in_current_state),

@in_current_state, -- required

COALESCE (@in_start_date, CAST (CURRENT_TIMESTAMP AS DATE),

(SELECT COALESCE (@in_expiry_date,

DATEADD(MM, @in_start_date, S.state_duration))

FROM StateChanges AS S

WHERE S.previous_state =

COALESCE (@in_previous_state, @in_current_state)

AND S.current_state = @in_current_state

AND S.duration_type = 'M'))

);

There are other tricks to assure that there are no gaps between the rows using DDL, but that is another article for another day.

Register for Simple Talk

State Transition Constraints

Introduction

The Time Dimension

The Temporal Side of Changes

Article tags

About the author

Joe Celko

Joe's contributions

Articles

Books

Top topics

Joe's latest contributions:

Data Encoding Schemes

Data Elements on a Nominal Scale

The CREATE DOMAIN Statement

Introduction

The Time Dimension

The Temporal Side of Changes

Article tags

Recommended

About the author

Joe Celko

Joe's contributions

Articles

Books

Top topics

Joe's latest contributions:

Data Encoding Schemes

Data Elements on a Nominal Scale

The CREATE DOMAIN Statement